Not long ago, making attractive, interactive maps required either a lot of code or very expensive GIS software. Today there are several Python packages for building polished maps, and below is an example using folium (https://python-visualization.github.io/folium/), my favorite Python mapping package. This article will demonstrate:
Google Colab does include folium, but the bundled version does not support a couple of handy features (such as setting the popup size), so we will install the latest compatible version. The package gpxpy is the easiest way to parse .gpx files (one of the most common GPS data formats); however, any XML parsing toolkit, including Python's built-in parser, can be used.
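To illustrate that last point, here is a minimal sketch of parsing GPX with only the standard library's `xml.etree.ElementTree`. The tiny GPX snippet and its coordinates are made up for illustration; they are not from the article's dataset:

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical GPX document with two track points
gpx_text = """<?xml version="1.0" encoding="UTF-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="46.87" lon="-113.99"><ele>978.0</ele></trkpt>
    <trkpt lat="46.88" lon="-113.98"><ele>981.5</ele></trkpt>
  </trkseg></trk>
</gpx>"""

ns = {'gpx': 'http://www.topografix.com/GPX/1/1'}  # GPX 1.1 namespace
root = ET.fromstring(gpx_text)

# Collect (lat, lon, elevation) tuples from every <trkpt> element
points = [
    (float(pt.get('lat')), float(pt.get('lon')),
     float(pt.find('gpx:ele', ns).text))
    for pt in root.iter('{http://www.topografix.com/GPX/1/1}trkpt')
]
print(points)  # → [(46.87, -113.99, 978.0), (46.88, -113.98, 981.5)]
```

gpxpy does exactly this kind of work for you, plus handling tracks, segments, and timestamps, which is why we use it below.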
!pip install gpxpy
!pip install folium -U
import gpxpy
import gpxpy.gpx
import folium
import pandas as pd
from folium.plugins import HeatMap
import requests, io, zipfile
import folium.plugins as plugins
from matplotlib import pyplot as plt
import numpy as np
import base64
from IPython.core.display import display
pd.set_option('display.max_columns', None)
print('Ready to Go!')
Requirement already satisfied: gpxpy in /usr/local/lib/python3.7/dist-packages (1.4.2)
Requirement already up-to-date: folium in /usr/local/lib/python3.7/dist-packages (0.12.1)
...
Ready to Go!
The .gpx files are zipped on my GitHub page. The following code downloads and extracts them.
activity_details_url = 'https://raw.githubusercontent.com/bimewok/Strava_Data_Visualization/main/data/ActivityDetails%20(2).csv'
data_url = 'https://github.com/bimewok/Strava_Data_Visualization/blob/main/data/named.zip?raw=true'
#leave blank to use current working directory. Do not end with \
path_to_save = ''
r = requests.get(data_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(path_to_save)
print('Downloaded!')
Downloaded!
Strava's bulk data download for a user's account includes a .csv listing attributes for each activity. Unfortunately, there was no way to join the tabular and spatial information without some unpleasant preprocessing. The .gpx files were named by their non-unique activity names as shown on Strava, so I mimicked Windows' file naming scheme in Excel and added each file's duplication number to a new feature in the tabular data (the num field). As long as you know the download order, you can then join the two datasets. For example:
Morning_Run.gpx, Morning_Run (1).gpx, Morning_Run (2).gpx...
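That numbering step can also be done in pandas rather than Excel. Here is a minimal sketch, assuming a DataFrame with an 'Activity Name' column in download order (the activity names below are made up):

```python
import pandas as pd

# Hypothetical activity names in download order
df = pd.DataFrame({'Activity Name': ['Morning Run', 'Morning Run',
                                     'Lunch Ride', 'Morning Run']})

# Number repeats the way Windows does: the first copy gets no suffix,
# later copies get " (1)", " (2)", ...
dup = df.groupby('Activity Name').cumcount()
base = df['Activity Name'].str.replace(' ', '_')
df['filename'] = base.where(dup == 0, base + ' (' + dup.astype(str) + ')') + '.gpx'
print(df['filename'].tolist())
# → ['Morning_Run.gpx', 'Morning_Run (1).gpx', 'Lunch_Ride.gpx', 'Morning_Run (2).gpx']
```

`groupby(...).cumcount()` counts each name's prior occurrences, which is exactly the duplication number Windows appends.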
We then need to clean up the activity names and add the 'num' column to create a filename feature:
def process_attributes():
    # Download the activity attribute table and read it into a DataFrame
    activities = requests.get(activity_details_url).content
    data = pd.read_csv(io.StringIO(activities.decode('utf-8')))
    data['filename'] = ''
    # Mirror Strava's download naming: punctuation and spaces become underscores
    data['Activity Name'] = data['Activity Name'].str.replace(',', '', regex=False)
    data['Activity Name'] = data['Activity Name'].str.replace(' - ', '_', regex=False)
    data['Activity Name'] = data['Activity Name'].str.replace('[!?-]', '_', regex=True)
    data['Activity Name'] = data['Activity Name'].str.replace(' ', '_', regex=False)
    # Append the duplication number (num) to rebuild each .gpx filename
    for i in range(len(data)):
        if pd.isna(data['num'][i]):
            data.loc[i, 'filename'] = str(data['Activity Name'][i]) + '.gpx'
        else:
            data.loc[i, 'filename'] = (str(data['Activity Name'][i])
                                       + ' (' + str(int(data['num'][i])) + ').gpx')
    data['date'] = pd.to_datetime(data['Activity Date'])
    data = data.sort_values('date').reset_index()
    display(data.iloc[250:260])
    return data
data = process_attributes()
| | index | Activity Date | Activity Name | Activity ID | Type | Equipment | Distance | Total Time | Moving Time | Avg Speed | Max Speed | Avg Power | Elevation Gain | Avg Grade | Kudos | Comments | Photos | Riders | Commute | Trainer | Description | num | filename | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | 338 | May 29, 2020 6:21 AM | Morning_Ride | 3530575084 | Ride | NaN | 6.12 | 01:17:16 | 01:04:22 | 5.7 | 22.1 | 162.7 | 1496 | 244.4 | 0 | 0 | 0 | 1 | NaN | NaN | NaN | 29.0 | Morning_Ride (29).gpx | 2020-05-29 06:21:00 |
| 251 | 88 | Jun 1, 2020 10:42 AM | Cache_Creek | 3550044190 | Hike | NaN | 14.77 | 05:18:25 | 04:16:21 | 3.5 | 8.7 | 0.0 | 1463 | 99.0 | 2 | 0 | 0 | 1 | NaN | NaN | NaN | NaN | Cache_Creek.gpx | 2020-06-01 10:42:00 |
| 252 | 219 | Jun 2, 2020 11:49 AM | Lunch_Ride | 3554771481 | Ride | NaN | 16.44 | 01:51:36 | 01:37:57 | 10.1 | 25.5 | 138.7 | 768 | 46.7 | 0 | 0 | 0 | 1 | NaN | NaN | NaN | 15.0 | Lunch_Ride (15).gpx | 2020-06-02 11:49:00 |
| 253 | 394 | Jun 3, 2020 7:53 AM | Warm_Springs_Day_1 | 3564836998 | Hike | NaN | 16.49 | 07:40:45 | 04:05:36 | 4.0 | 19.7 | 0.0 | 2867 | 173.8 | 1 | 0 | 2 | 1 | NaN | NaN | NaN | NaN | Warm_Springs_Day_1.gpx | 2020-06-03 07:53:00 |
| 254 | 395 | Jun 4, 2020 7:43 AM | Warm_Springs_Day_2 | 3564781014 | Hike | NaN | 17.75 | 06:29:41 | 05:32:12 | 3.2 | 7.8 | 0.0 | 2034 | 114.6 | 1 | 0 | 4 | 1 | NaN | NaN | NaN | NaN | Warm_Springs_Day_2.gpx | 2020-06-04 07:43:00 |
| 255 | 52 | Jun 5, 2020 2:53 PM | Afternoon_Ride | 3569405725 | Ride | NaN | 18.29 | 01:35:23 | 01:32:59 | 11.8 | 25.9 | 174.8 | 1086 | 59.4 | 0 | 0 | 1 | 1 | NaN | NaN | NaN | 24.0 | Afternoon_Ride (24).gpx | 2020-06-05 14:53:00 |
| 256 | 27 | Jun 7, 2020 1:05 PM | Afternoon_Hike | 3580457599 | Hike | NaN | 7.40 | 03:16:25 | 03:01:43 | 2.4 | 6.7 | 0.0 | 2054 | 277.6 | 1 | 0 | 2 | 1 | NaN | NaN | NaN | 21.0 | Afternoon_Hike (21).gpx | 2020-06-07 13:05:00 |
| 257 | 278 | Jun 8, 2020 10:54 AM | Morning_Hike | 3584653261 | Hike | NaN | 8.70 | 03:22:40 | 03:03:29 | 2.8 | 7.2 | 0.0 | 1437 | 165.2 | 1 | 0 | 3 | 1 | NaN | NaN | NaN | 20.0 | Morning_Hike (20).gpx | 2020-06-08 10:54:00 |
| 258 | 257 | Jun 9, 2020 8:43 AM | Morning_Activity | 3590588482 | Kayaking | NaN | 21.52 | 07:22:54 | 05:17:45 | 4.1 | 22.8 | 0.0 | 1473 | 68.5 | 0 | 0 | 0 | 1 | NaN | NaN | NaN | 9.0 | Morning_Activity (9).gpx | 2020-06-09 08:43:00 |
| 259 | 147 | Jun 11, 2020 9:27 AM | Little_Belt_Day_1 | 3604928770 | Hike | NaN | 14.79 | 05:45:59 | 05:13:21 | 2.8 | 6.9 | 0.0 | 2165 | 146.4 | 3 | 0 | 3 | 1 | NaN | NaN | NaN | NaN | Little_Belt_Day_1.gpx | 2020-06-11 09:27:00 |
These files contain millions of GPS points, so the following loop parses the .gpx points for all three maps in a single pass.
strava_map1, built in this cell, is a simple feature-location map where the different activity types are symbolized by color. I also created a popup box for each activity containing key statistics and a .png chart of its elevation profile. The other maps will be initialized in later cells using data from this parsing loop.
The loop:
strava_map1 = folium.Map(location=[46.87, -113.987], zoom_start=11, tiles='Stamen Terrain')
folium.TileLayer('Stamen Terrain').add_to(strava_map1)
folium.TileLayer('http://mt0.google.com/vt/lyrs=s&hl=en&x={x}&y={y}&z={z}',
attr='Map data ©2020 Google', name='Google Satellite').add_to(strava_map1)
feature_group1 = folium.FeatureGroup(name='Run')
feature_group2 = folium.FeatureGroup(name='Ride')
feature_group3 = folium.FeatureGroup(name='Hike')
feature_group4 = folium.FeatureGroup(name='Other')
#feature_group5 = folium.FeatureGroup(name='Heatmap')
at = 0
all_points_heatmap = []
all_points_heatmap_time = []
for i in range(len(data['filename'])):
    # Read and parse this activity's .gpx file
    if path_to_save != '':
        gpx_path = path_to_save + '/' + data['filename'][i]
    else:
        gpx_path = data['filename'][i]
    with open(gpx_path, 'r') as content:
        gpx = gpxpy.parse(content)
    points = []
    elev = []
    points_weight = []
    for track in gpx.tracks:
        resampler = 0
        for segment in track.segments:
            for point in segment.points:
                resampler += 1
                # Keep every 30th point to thin out millions of gps fixes
                if resampler % 30 == 0:
                    points.append(tuple([point.latitude, point.longitude]))
                    elev.append(point.elevation * 3.28)  # meters to feet
                    all_points_heatmap.append(tuple([point.latitude, point.longitude]))
                # A sparser sample feeds the time-lapse heatmap
                if resampler % 400 == 0:
                    points_weight.append([point.latitude, point.longitude, 1])
    # Build the elevation-profile chart for this activity's popup
    fig, ax = plt.subplots()
    ax.plot(np.linspace(0.0, pd.Timedelta(data['Total Time'][i]) / np.timedelta64(1, 'h'),
                        num=len(elev)), elev, color='tab:orange')
    ax.set(xlabel='Time (Hrs.)', ylabel='Elevation (ft.)',
           title='Elevation Profile')
    ax.grid()
    if path_to_save == '':
        png_path = str(data['Activity Name'][i]) + '.png'
    else:
        png_path = path_to_save + '/' + str(data['Activity Name'][i]) + '.png'
    fig.savefig(png_path, dpi=50)
    plt.close()
    # Accumulate the sparser samples for the time-lapse heatmap (later cell)
    if i == 0:
        all_points_heatmap_time.append(points_weight)
    else:
        for p in all_points_heatmap_time[i-1]:
            points_weight.append(p)
        all_points_heatmap_time.append(points_weight[:int(len(all_points_heatmap)*0.03)])
    tooltip = "Name:{}<br> Date: {}<br> Click for more".format(
        data['Activity Name'][i] + ' - ' + data['Type'][i],
        data['Activity Date'][i])
    # Embed the chart in the popup as a base64-encoded png
    encoded = base64.b64encode(open(png_path, 'rb').read())
    html = """<i>Distance: </i> <br> <b>{}</b> <br>
    <i>Elevation Gain: </i><b><br>{}</b><br>
    <i>Moving Time: </i><b><br>{}</b><br>
    <i>Link: </i><b><br><a href={} target="_blank">Strava Link</a><br>
    <img src="data:image/png;base64,{}">""".format(
        str(data['Distance'][i]) + ' (mi.)',
        str(data['Elevation Gain'][i]) + ' (ft.)', data['Moving Time'][i],
        'https://www.strava.com/activities/' + str(data['Activity ID'][i]),
        encoded.decode('UTF-8'))
    iframe = folium.IFrame(html)
    popup = folium.Popup(iframe, min_width=350, max_width=350)
    # Route each activity to the feature group matching its type
    if data['Type'][i] == 'Run':
        folium.PolyLine(points, color="orange", weight=2.5, opacity=1,
                        popup=popup, tooltip=tooltip).add_to(feature_group1)
    elif data['Type'][i] == 'Ride':
        folium.PolyLine(points, color="blue", weight=2.5, opacity=1,
                        popup=popup, tooltip=tooltip).add_to(feature_group2)
    elif data['Type'][i] == 'Hike':
        folium.PolyLine(points, color="red", weight=2.5, opacity=1,
                        popup=popup, tooltip=tooltip).add_to(feature_group3)
    else:
        folium.PolyLine(points, color="purple", weight=2.5, opacity=1,
                        popup=popup, tooltip=tooltip).add_to(feature_group4)
    at += 1
    if at % 30 == 0:
        if len(data) - at > 100:
            print('finished: ', str(at), ' files, ', len(data)-at, ' to go :(')
        else:
            print('finished: ', str(at), ' files, ', len(data)-at, ' to go :)')
strava_map1.add_child(feature_group1)
strava_map1.add_child(feature_group2)
strava_map1.add_child(feature_group3)
strava_map1.add_child(feature_group4)
strava_map1.add_child(folium.map.LayerControl('topright', collapsed= False))
if path_to_save == '':
    strava_map1.save('1.html')
else:
    strava_map1.save(path_to_save + '/1.html')
strava_map1
finished: 30 files, 368 to go :(
finished: 60 files, 338 to go :(
...
finished: 390 files, 8 to go :)